Three-Dimensional Phylogeny Explorer: Distinguishing paralogs, lateral transfer, and violation of "molecular clock" assumption with 3D visualization

Author: AJ Saldanha
Christopher Lee
CM Zmasek
CS Parr
DL Swofford
DL Wheeler
EV Koonin
G Trooskens
JD Retief
M Stallmann
MJ Sanderson
Namshin Kim
PL Lott
R Chenna
RD Page
RL Tatusov
S Kumar
SW Graham
Y Zhai
Z Du
Publication venue: BioMed Central
Publication date: 01/06/2007

Abstract. Background: Construction and interpretation of phylogenetic trees have been a major research topic for understanding the evolution of genes. Increases in sequence data and complexity are creating a need for more powerful and insightful tree visualization tools. Results: We have developed 3D Phylogeny Explorer (3DPE), a novel phylogenetic tree viewer that maps trees onto three spatial axes (species on the X-axis, paralogs on the Z-axis, evolutionary distance on the Y-axis), enabling one to distinguish at a glance evolutionary features such as speciation, gene duplication and paralog evolution, lateral gene transfer, and violation of the "molecular clock" assumption. Users can input any tree into the online 3DPE, then rotate, scroll, rescale, and explore it interactively as "live" 3D views. All objects in 3DPE are clickable to display subtrees, connectivity path highlighting, sequence alignments, and gene summary views. To illustrate the value of this visualization approach for microbial genomes, we also generated 3D phylogeny analyses for all clusters from the public COG database. We constructed tree views using well-established methods and graph algorithms. We used Scientific Python to generate VRML2 3D views viewable in any web browser. Conclusion: 3DPE provides a novel method for projecting phylogenetic trees into 3D space, together with a web-based implementation with live 3D features for the reconstructed phylogenetic trees of the COG database.
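
A minimal sketch of the axis mapping described above: each leaf takes X from its species, Z from a paralog index, and Y from its cumulative branch length from the root, and the tree edges are then written out as a VRML2 `IndexedLineSet`. The input dictionaries (`children`, `branch_length`, `species_of`, `paralog_of`) and the rule of placing an internal node at the mean X and Z of its children are hypothetical choices for illustration, not 3DPE's actual layout code.

```python
def layout_3d(children, branch_length, species_of, paralog_of, root):
    """Return {node: (x, y, z)} with x = species index, y = distance from root, z = paralog index.

    Hypothetical inputs: children maps parent -> list of child nodes,
    branch_length maps node -> length of the edge above it,
    species_of / paralog_of are defined for leaves only.
    """
    species_index = {s: i for i, s in enumerate(sorted(set(species_of.values())))}
    coords = {}

    def place(node, depth):
        kids = children.get(node, [])
        for kid in kids:
            place(kid, depth + branch_length[kid])
        if kids:
            # Assumption: internal nodes sit at the mean X/Z of their children.
            x = sum(coords[k][0] for k in kids) / len(kids)
            z = sum(coords[k][2] for k in kids) / len(kids)
        else:
            x = float(species_index[species_of[node]])
            z = float(paralog_of[node])
        coords[node] = (x, depth, z)

    place(root, 0.0)
    return coords


def to_vrml(children, coords):
    """Serialize the tree edges as a VRML2 (VRML97) IndexedLineSet."""
    nodes = list(coords)
    index = {n: i for i, n in enumerate(nodes)}
    points = ", ".join(f"{x:g} {y:g} {z:g}" for x, y, z in (coords[n] for n in nodes))
    # Each edge is a two-point polyline terminated by -1.
    segments = ", ".join(
        f"{index[parent]}, {index[kid]}, -1" for parent, kids in children.items() for kid in kids
    )
    return (
        "#VRML V2.0 utf8\n"
        "Shape {\n  geometry IndexedLineSet {\n"
        f"    coord Coordinate {{ point [ {points} ] }}\n"
        f"    coordIndex [ {segments} ]\n"
        "  }\n}\n"
    )
```

Under a strict molecular clock every leaf would land at the same Y value, so leaves at uneven heights make clock violations visible, while the paralog index separates duplicated genes of the same species along Z.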

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Reconciling taxonomy and phylogenetic inference: formalism and algorithms for describing discord and inferring taxonomic roots

Author: A Stamatakis
Aaron Gallagher
D Dalevi
D McDonald
E Bachoore
Frederick A Matsen
HL Bodlaender
J Hein
M Price
O Ponta
R Tatusov
S Moran
VB Yap
Publication date: 01/10/2011

Although taxonomy is often used informally to evaluate the results of phylogenetic inference and to find the root of phylogenetic trees, algorithmic methods for doing so are lacking. In this paper we formalize these procedures and develop algorithms to solve the relevant problems. In particular, we introduce a new algorithm that solves a "subcoloring" problem for expressing the difference between the taxonomy and the phylogeny at a given rank. This algorithm improves upon the current best algorithm in terms of asymptotic complexity for the parameter regime of interest; we also describe a branch-and-bound algorithm that saves orders of magnitude in computation on real data sets. We also develop a formalism and an algorithm for rooting phylogenetic trees according to a taxonomy. All of these algorithms are implemented in freely available software. Comment: Version submitted to Algorithms for Molecular Biology. A number of fixes from the previous version.
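
The subcoloring problem concerns leaves colored by their taxonomic label at one rank. A natural building block, assumed here, is the notion of a convex coloring: the minimal subtrees spanning each color's leaves must be pairwise vertex-disjoint. The sketch below only checks that property on an unrooted tree; it is not the paper's subcoloring or branch-and-bound algorithm, and the adjacency-dict representation and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_tree(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def spanning_subtree(adj, terminals):
    """Node set of the minimal subtree connecting `terminals` (prunes non-terminal leaves)."""
    nodes = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in nodes if degree[v] <= 1 and v not in terminals]
    while stack:
        v = stack.pop()
        if v not in nodes:
            continue  # already pruned via another path
        nodes.remove(v)
        for u in adj[v]:
            if u in nodes:
                degree[u] -= 1
                if degree[u] <= 1 and u not in terminals:
                    stack.append(u)
    return nodes


def is_convex(adj, leaf_color):
    """True if each color's spanning subtree is disjoint from every other color's."""
    by_color = defaultdict(set)
    for leaf, color in leaf_color.items():
        by_color[color].add(leaf)
    spans = {c: spanning_subtree(adj, leaves) for c, leaves in by_color.items()}
    return all(spans[a].isdisjoint(spans[b]) for a, b in combinations(spans, 2))
```

On the tree ((A,B),(C,D)), labeling {A,B} and {C,D} alike is convex, whereas labeling {A,C} and {B,D} alike is not, because both spanning subtrees pass through the central edge.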

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

On strongly chordal graphs that are not leaf powers

Author: A Brandstädt
B Shutters
D Fulkerson
E Bibelnieks
H-J Bandelt
JP Spinrad
L Li
M Farber
M Lafond
M Steel
N Nishimura
R Nevries
R Paige
RL Tatusov
T Calamoneri
V Berry
W Kennedy
Publication date: 02/07/2017

A common task in phylogenetics is to find an evolutionary tree representing proximity relationships between species. This motivates the notion of leaf powers: a graph G = (V, E) is a leaf power if there exist a tree T on leaf set V and a threshold k such that uv is an edge if and only if the distance between u and v in T is at most k. Characterizing leaf powers is a challenging open problem, along with determining the complexity of their recognition. This is partly because few graphs are known not to be leaf powers, as such graphs are difficult to construct. Recently, Nevries and Rosenke asked whether leaf powers could be characterized by strong chordality and a finite set of forbidden subgraphs. In this paper, we answer this question negatively by exhibiting an infinite family 𝒢 of (minimal) strongly chordal graphs that are not leaf powers. Along the way, we establish a connection between leaf powers, alternating cycles, and quartet compatibility. We also show that deciding whether a chordal graph is 𝒢-free is NP-complete, which may provide insight into the complexity of the leaf power recognition problem.
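
The leaf-power definition translates directly into code. This sketch, assuming an unweighted tree given as an adjacency dict, builds the k-leaf power of a tree by running a breadth-first search from every leaf and joining two leaves whenever their tree distance is at most k. Recognizing whether a given graph is a leaf power, the hard direction discussed in the abstract, is not attempted.

```python
from collections import deque
from itertools import combinations


def tree_distances(adj, source):
    """Edge-count distances from `source` to every node (BFS on an unweighted tree)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def leaf_power_graph(adj, leaves, k):
    """Edge set {u, v} of the k-leaf power: leaves u, v adjacent iff dist_T(u, v) <= k."""
    dist = {leaf: tree_distances(adj, leaf) for leaf in leaves}
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(leaves), 2)
        if dist[u][v] <= k
    }
```

In the tree ((A,B),(C,D)), A and B are at distance 2 while A and C are at distance 3, so k = 2 yields two disjoint edges and k = 3 yields the complete graph on the four leaves.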

arXiv.org e-Print Archive

Crossref

IsoBase: a database of functionally related proteins across PPI networks

Author: B. Berger
C.-S. Liao
Chen
D. Park
Koonin
M. Baym
O'Brien
R. Singh
Salwinski
Sharan
Tatusov
Publication venue: Oxford University Press (OUP)
Publication date: 01/10/2010

We describe IsoBase, a database identifying functionally related proteins across five major eukaryotic model organisms: Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus and Homo sapiens. Nearly all existing algorithms for orthology detection are based on sequence comparison. Although these have been successful in orthology prediction to some extent, we seek to go beyond these methods by integrating sequence data and protein–protein interaction (PPI) networks to help identify truly functionally related proteins. With that motivation, we introduce IsoBase, the first publicly available ortholog database that focuses on functionally related proteins. The groupings were computed using the IsoRankN algorithm, which uses spectral methods to combine sequence and PPI data and produce clusters of functionally related proteins. These clusters compare favorably with those from existing approaches: proteins within an IsoBase cluster are more likely to share similar Gene Ontology (GO) annotation. A total of 48,120 proteins were clustered into 12,693 functionally related groups. The IsoBase database may be browsed for functionally related proteins across two or more species and may also be queried by accession numbers, species-specific identifiers, gene name or keyword. The database is freely available for download at http://isobase.csail.mit.edu/. Funding: National Institute of General Medical Sciences (U.S.) (Grant Number 1R01GM081871); Fannie and John Hertz Foundation; National Science Foundation (U.S.) (NSF MSPRF); National Science Council of Taiwan (NSC99-2218-E-007-010); National Institutes of Health (U.S.) (1R01GM081871).
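
IsoRankN's spectral clustering is not reproduced here. As a hedged sketch of the idea it builds on, the pairwise IsoRank score that mixes network topology with sequence similarity, the code below iterates R = α·(neighbor-propagated R) + (1 − α)·E, where E holds normalized sequence similarity scores. The dict-based graph representation, `alpha = 0.8`, and the fixed iteration count are assumptions chosen for illustration.

```python
def isorank(adj1, adj2, seq_sim, alpha=0.8, iters=100):
    """Pairwise IsoRank-style scores R[(i, j)] for node i of network 1 and node j of network 2.

    adj1, adj2: dict node -> list of neighbours (undirected PPI networks).
    seq_sim: dict (i, j) -> non-negative sequence similarity (hypothetical input format);
             at least one pair must have a positive score.
    """
    nodes1, nodes2 = list(adj1), list(adj2)
    pairs = [(i, j) for i in nodes1 for j in nodes2]
    # Normalize sequence similarities so that E sums to 1.
    total = sum(seq_sim.get(p, 0.0) for p in pairs)
    E = {p: seq_sim.get(p, 0.0) / total for p in pairs}
    # Start from a uniform score distribution.
    R = {p: 1.0 / len(pairs) for p in pairs}
    for _ in range(iters):
        # Each pair (i, j) collects score from neighbouring pairs (u, v), where
        # R[(u, v)] is split evenly over the neighbours of u and of v; E is mixed in.
        R = {
            (i, j): alpha * sum(
                R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                for u in adj1[i]
                for v in adj2[j]
            )
            + (1 - alpha) * E[(i, j)]
            for (i, j) in pairs
        }
    return R
```

On two identical three-node paths with uniform sequence similarity, the center of one path still scores highest against the center of the other, which shows the network term alone separating candidate matches.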

Haemophilus Influenzae Microarrays: Virulence and Vaccines

Author: Akerley
Ali
de Saizieu
Fleischmann
Gmuender
Hood
J. Simon Kroll
Karp
Paul R. Langford
Tahir R. Ali
Tatusov
Zwahlen
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2002

In 1995 the genome sequence of the Haemophilus influenzae KW20 (Rd) strain was published, the first available for a free-living organism. The genome has been invaluable in global strategies to identify certain virulence-related genes, e.g. those involved in LPS synthesis, and also essential genes, but there is a paucity of whole-genome transcriptome studies. We have now constructed a whole-genome array consisting of genes from Rd, additional genes identified in other strains of H. influenzae, and controls (from eukaryotic sources and other bacteria). We intend to use this array in studies aimed at understanding the bacterium’s basic metabolism and its response to changing environments, deciphering global regulatory networks (by comparing wild-type and mutant strains), and identifying genes expressed in vivo. The use of H. influenzae DNA arrays combined with proteomic approaches will enhance our understanding of the metabolism and virulence of the organism. Additionally, the genome sequence of a non-typable H. influenzae strain is in progress. The sequence from this isolate will be invaluable not only in identifying potential novel antibiotic targets and putative vaccine candidates but also in the design of a microarray for genome-typing purposes.

Crossref

PubMed Central

Statistically validated networks in bipartite complex systems

Author: A McCallum
AL Barabási
CM Song
DJ Watts
DY Kenett
Eshel Ben-Jacob
F Reed-Tsochas
F Schweitzer
Fabrizio Lillo
FD Ciccarelli
G Bonanno
J Bascompte
JP Onnela
Jyrki Piilo
M Girvan
M Rosvall
M Tumminello
MEJ Newman
Michele Tumminello
R Guimera
RG Miller
RL Tatusov
RN Mantegna
Rosario N. Mantegna
S Fortunato
Salvatore Miccichè
V Colizza
W Feller
Y Benjamini
Publication venue: Public Library of Science (PLoS)
Publication date: 08/08/2010

Many complex systems have an intrinsically bipartite nature and are often described and modeled in terms of networks [1-5]. Examples include movies and actors [1, 2, 4], authors and scientific papers [6-9], email accounts and emails [10], and plants and the animals that pollinate them [11, 12]. Bipartite networks are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set. When one constructs a projected network with nodes from only one set, the system heterogeneity makes it very difficult to identify preferential links between the elements. Here we introduce an unsupervised method to statistically validate each link of the projected network against a null hypothesis that takes into account the heterogeneity of the system. We apply our method to three different systems, namely the set of clusters of orthologous genes (COG) in completely sequenced genomes [13, 14], a set of daily returns of 500 US financial stocks, and the set of world movies of the IMDb database [15]. In all these systems, which differ in size and level of heterogeneity, we find that our method is able to detect network structures that are informative about the system and are not simply an expression of its heterogeneity. Specifically, our method (i) identifies the preferential relationships between the elements, (ii) naturally highlights the clustered structure of the investigated systems, and (iii) allows one to classify links according to the type of statistically validated relationships between the connected nodes. Comment: Main text: 13 pages, 3 figures, and 1 table. Supplementary information: 15 pages, 3 figures, and 2 tables.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Archivio istituzionale della Ricerca - Scuola Normale Superiore

PubMed Central

Archivio istituzionale della ricerca - Università di Palermo

Towards validating the hypothesis of phylogenetic profiling

Author: D Lin
EM Marcotte
J Handl
J Jäkel
J Seo
J Sun
J Wu
M Pellegrini
Mazen Atwi
N Bolshakova
P Resnik
R Loganantharaj
Raja Loganantharaj
RL Tatusov
SF Altschul
SV Date
Publication venue: BioMed Central
Publication date: 01/11/2007

Crossref

Springer - Publisher Connector

PubMed Central

ProOpDB: Prokaryotic Operon DataBase

Author: B. Taboada
C. E. Martinez-Guerrero
Ciccarelli
E. Merino
Eddy
Price
R. Ciria
Serganov
Tatusov
Toledo-Arana
Publication venue: Oxford University Press

The Prokaryotic Operon DataBase (ProOpDB, http://operons.ibt.unam.mx/OperonPredictor) constitutes one of the most precise and complete repositories of operon predictions now available. Using our novel and highly accurate operon identification algorithm, we have predicted the operon structures of more than 1,200 prokaryotic genomes. ProOpDB offers several ways to retrieve a set of operon predictions, including by: (i) organism name, (ii) metabolic pathways, as defined by the KEGG database, (iii) gene orthology, as defined by the COG database, (iv) conserved protein domains, as defined by the Pfam database, (v) reference gene and (vi) reference operon, among others. To limit the operon output to non-redundant organisms, ProOpDB offers an efficient method to select the most representative organisms based on a precompiled phylogenetic distance matrix. In addition, the ProOpDB operon predictions are used directly as the input data of our Gene Context Tool to visualize their genomic context and retrieve the sequence of their corresponding 5′ regulatory regions, as well as the nucleotide or amino acid sequences of their genes.
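
ProOpDB's representative-selection method is not described in detail here. One simple way to pick non-redundant organisms from a precomputed distance matrix, shown as a hypothetical greedy sketch, is to walk through the organisms in priority order and keep each one only if it lies at least a threshold distance from every organism already kept. The `min_distance` threshold and the nested-dict matrix layout are assumptions.

```python
def select_representatives(organisms, distance, min_distance):
    """Greedy, order-dependent selection of mutually distant organisms.

    organisms: list of names in priority order (hypothetical input).
    distance: nested dict, distance[a][b] = phylogenetic distance between a and b.
    """
    kept = []
    for org in organisms:
        # Keep org only if it is far enough from every organism already chosen.
        if all(distance[org][k] >= min_distance for k in kept):
            kept.append(org)
    return kept
```

The result depends on the input order, so listing well-annotated reference strains first makes them the preferred representatives of their clades.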

Crossref

PubMed Central

Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes

Author: Alfarano
Alm
B. Khaira
Bateman
Brinkman
Chenna
Choi
Darling
F. S. L. Brinkman
Fulton
G. L. Winsor
Gardy
Karp
Lewenza
Lynn
M. D. Whiteside
Markowitz
Peterson
R. E. W. Hancock
R. Lo
Rey
Stein
T. Van Rossum
Tatusov
Wheeler
Winsor
Publication venue: Oxford University Press

Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and ability to cause life-threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation of the Pseudomonas aeruginosa PAO1 reference strain genome. To facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database to include all Pseudomonas species, and have developed or incorporated methods to facilitate high-quality comparative genomics. The database contains robust assessment of orthologs and a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. Both simple and more flexible user-friendly Boolean search features allow researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean-searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high-quality, annotated genome resource for the research community and is available under an open source license.

Crossref

PubMed Central

Partial Homology Relations - Satisfiability in terms of Di-Cographs

Author: A Brandstädt
AM Altenhoff
C Crespelle
C Dessimoz
DG Corneil
F Chen
F Gurski
G Östlund
J Engelfriet
J Sukumaran
JG Lawrence
K Hartmann
K Trachana
M Hellmuth
M Lafond
M Lechner
M Ravenhall
R Dondi
RL Tatusov
RM McConnell
S Böcker
WM Fitch
Y Gao
Y Liu
Publication date: 03/05/2018

Directed cographs (di-cographs) play a crucial role in the reconstruction of evolutionary histories of genes based on homology relations, which are binary relations between genes. A variety of methods based on pairwise sequence comparisons can be used to infer such homology relations (e.g., orthology, paralogy, xenology). These relations are satisfiable if they can be explained by an event-labeled gene tree, i.e., if they can simultaneously coexist in an evolutionary history of the underlying genes. Every gene tree can equivalently be interpreted as a so-called cotree that entirely encodes the structure of a di-cograph. Thus, satisfiable homology relations must necessarily form a di-cograph. The inferred homology relations might not cover every pair of genes and thus provide only partial knowledge of the full set of homology relations. Moreover, for particular pairs of genes, it might be known with a high degree of certainty that they are not orthologs (respectively, not paralogs or not xenologs), which yields forbidden pairs of genes. Motivated by this observation, we characterize (partial) satisfiable homology relations with or without forbidden gene pairs, and provide a quadratic-time algorithm for their recognition and for the computation of a cotree that explains the given relations.
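
The paper's quadratic-time algorithm handles partial, directed relations and forbidden pairs, which is beyond a short sketch. For the simpler undirected case, where a single symmetric relation such as orthology must form a cograph, the code below recognizes a cograph and returns a cotree by recursively splitting the vertex set into connected components of either the graph (a disjoint-union, or "parallel", node) or its complement (a join, or "series", node). If neither is disconnected, the graph is not a cograph. In the orthology setting, series nodes correspond to speciations and parallel nodes to duplications. The nested-tuple cotree format and function names are hypothetical, and this naive recursion is slower than quadratic.

```python
def components(vertices, adjacent):
    """Connected components of the graph on `vertices` under the predicate adjacent(u, v)."""
    remaining = set(vertices)
    comps = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in list(remaining):
                if adjacent(u, v):
                    remaining.remove(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def cotree(vertices, edges):
    """Cotree as nested ("parallel"|"series", children) tuples, or None if not a cograph.

    edges: set of frozenset({u, v}) pairs of an undirected graph.
    """
    def adj(u, v):
        return frozenset((u, v)) in edges

    def build(vs):
        if len(vs) == 1:
            return next(iter(vs))
        parts = components(vs, adj)
        if len(parts) > 1:
            label = "parallel"  # graph is a disjoint union of its components
        else:
            parts = components(vs, lambda u, v: not adj(u, v))
            if len(parts) == 1:
                return None  # graph and complement both connected: not a cograph
            label = "series"  # graph is the join of the complement's components
        children = [build(p) for p in parts]
        if any(child is None for child in children):
            return None
        return (label, children)

    return build(set(vertices))
```

The induced path a-b-c-d is the smallest non-cograph: it and its complement are both connected, so `cotree` returns `None` for it.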

arXiv.org e-Print Archive

Crossref

University of Southern Denmark Research Output